Preprocessing

RegexpTokenizer
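A minimal sketch of the tokenization step, assuming NLTK's `RegexpTokenizer` with a letters-only pattern `r"[A-Za-z]+"`. The pattern actually used in this notebook is not stated, and the sample sentence is hypothetical.

```python
from nltk.tokenize import RegexpTokenizer

# Assumed pattern: keep runs of ASCII letters, drop digits and punctuation.
tokenizer = RegexpTokenizer(r"[A-Za-z]+")

text = "Hello, world! Visit page-2 now."  # hypothetical sample text
tokens = tokenizer.tokenize(text)
print(tokens)  # ['Hello', 'world', 'Visit', 'page', 'now']
```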

SnowballStemmer
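The stemming step can be sketched with NLTK's English `SnowballStemmer`, which reduces inflected forms to a shared stem so that related words count as the same feature. The word list is hypothetical.

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")

words = ["running", "runs", "connected", "connection"]  # hypothetical tokens
stems = [stemmer.stem(w) for w in words]
print(stems)  # ['run', 'run', 'connect', 'connect']
```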

Visualization

1. Visualize some important keys using a word cloud

Creating Model

CountVectorizer

* Splitting the data
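The split can be sketched with scikit-learn's `train_test_split`. The 80/20 ratio, `random_state=42`, and the toy arrays are assumptions; the notebook does not give its split settings.

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]  # hypothetical features
y = [0, 1] * 5                # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed: 20% held out, fixed seed
)
print(len(X_train), len(X_test))  # 8 2
```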

LogisticRegression

* Logistic Regression gives 96% accuracy. Now we store the scores in a dict to see which model performs best.
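A minimal sketch of fitting `LogisticRegression` and recording its accuracy in a dict, assuming a hypothetical toy corpus and split settings. It does not reproduce the notebook's 96% figure.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus: label 1 = positive words, 0 = negative words.
texts = [
    "good great fine", "great good nice", "nice fine good", "good nice great",
    "bad awful poor", "poor bad terrible", "awful terrible bad", "bad poor awful",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

scores = {}  # model name -> accuracy on the held-out split
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
scores["Logistic Regression"] = lr.score(X_test, y_test)  # mean accuracy
print(scores)
```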

MultinomialNB

* MultinomialNB gives us 95% accuracy
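The comparison can be sketched by fitting both classifiers on the same split and picking the highest entry in the scores dict. The toy data and split settings are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: label 1 = positive words, 0 = negative words.
texts = [
    "good great fine", "great good nice", "nice fine good", "good nice great",
    "bad awful poor", "poor bad terrible", "awful terrible bad", "bad poor awful",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),  # suited to non-negative count features
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

best = max(scores, key=scores.get)  # on a tie, the first inserted name wins
print(scores, "best:", best)
```

On a toy set this small both models can tie, so the winner here says nothing about the real data.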

* So, with 96% versus 95% accuracy, Logistic Regression is the best-fitting model. Now we build an sklearn pipeline using Logistic Regression.

* We get an accuracy of 96.5% on the test set.
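A sketch of such a pipeline, assuming the tokenizer and stemmer from the preprocessing step are plugged into `CountVectorizer` through a hypothetical helper `tokenize_and_stem`. The notebook does not show its exact pipeline steps, and the training texts are made up.

```python
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tokenizer = RegexpTokenizer(r"[A-Za-z]+")  # assumed pattern
stemmer = SnowballStemmer("english")

def tokenize_and_stem(text):
    """Hypothetical helper: regex tokenization followed by Snowball stemming."""
    return [stemmer.stem(tok) for tok in tokenizer.tokenize(text)]

pipeline = make_pipeline(
    # token_pattern=None because the custom tokenizer replaces it.
    CountVectorizer(tokenizer=tokenize_and_stem, token_pattern=None),
    LogisticRegression(max_iter=1000),
)

# Hypothetical training data.
texts = [
    "good great fine", "great good nice", "nice fine good", "good nice great",
    "bad awful poor", "poor bad terrible", "awful terrible bad", "bad poor awful",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline.fit(texts, labels)
print(pipeline.predict(["great and nice", "bad and poor"]))
```

Bundling vectorizer and classifier means raw strings go in at prediction time, and the vocabulary is learned only from the data passed to `fit`.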